Simplified synchronous mesh processor.



A mesh processor array (10) includes a plurality of one-bit processor cells arranged in a matrix (30). Each
processor receives inputs from adjacent processors or external sources and performs a logical function involving 
its own present state and the inputs thereto. Control circuitry (20) provides control information indicative of a 
logical function to be performed to each of the processors in parallel, and pattern selection circuitry (40, 50) 
enables selected ones of the processors to respond to the control information. 
SIMPLIFIED SYNCHRONOUS MESH PROCESSOR



BACKGROUND OF THE INVENTION 



The subject invention is directed generally to mesh processing arrays, and is more specifically directed
to a one-bit mesh processor and a mesh processor array architecture that utilizes the one-bit processor.

A mesh processing array is a form of parallel processing wherein generally identical mesh processors 
are interconnected in a grid-like fashion, for example, in rows and columns. Each processor is coupled to 
processors adjacent thereto (e.g., a maximum of four in a row and column configuration) with data 
input/outputs being provided via the processors on the periphery of the grid array. Commonly, the 
processors receive control signals (e.g., control words or op-codes) in parallel and are clocked in parallel.

Examples of known mesh processor arrays include the NCR 45CG72 array processor and the AMT 
DAP array processor. 

An important consideration with some known mesh processor arrays is the allocation of dedicated
storage (memory) per processor cell, which is typically not sufficiently large (e.g., 128 bits) except for a few
applications. Greater memory requirements are met by the use of a virtual processor cell comprising a
plurality of real processor cells, which generally results in wasted memory since the virtual cell memory is
an integral multiple of the real cell memory size.

A further consideration with known mesh processor arrays is the use of special function units or other 
special hardware which is utilized only part of the time, and therefore is not efficiently utilized. 
As a result of large memories and special hardware, known processor arrays are quite large and cannot
be operated at high clock rates. 



SUMMARY OF THE INVENTION 

It would therefore be an advantage to provide a mesh processor that is not complex and is efficiently 
utilized in a mesh processor array. 

Another advantage would be to provide a mesh processor and array which can be clocked at a high 
rate.

A further advantage would be to provide a mesh processor and array which provide computational 
flexibility. 

Another advantage would be to provide a mesh processor and array which provide for efficient memory 
utilization. 

The foregoing and other advantages are provided by the invention in a mesh processor array which
includes a plurality of one-bit processor cells arranged in a matrix. Each processor receives inputs from
adjacent processors or from external sources and performs a logical function involving its own present
output and the inputs thereto. Control circuitry provides control information indicative of a logical function to
be performed to each of the processors in parallel, and selection circuitry enables selected ones of the
processors to respond to the control information.



BRIEF DESCRIPTION OF THE DRAWING 

The advantages and features of the disclosed invention will readily be appreciated by persons skilled in 
the art from the following detailed description when read in conjunction with the drawing wherein: 
FIG. 1 is a block diagram of a mesh processor array in accordance with the invention. 
FIG. 2 is a block diagram showing the interconnection of the processors of the mesh processor array of
FIG. 1.

FIG. 3 is a generalized circuit schematic of a mesh processor in accordance with the invention.

FIG. 4 is a circuit schematic of a specific implementation of the mesh processor of FIG. 3.

FIG. 5 is a circuit schematic of a specific implementation of the multiplexers of the circuit of FIG. 4.

FIGS. 6A through 6M schematically illustrate a specific example of the process for modulo 8 addition
with a mesh processor array which includes processors as illustrated in FIGS. 4 and 5.
DETAILED DESCRIPTION



In the following detailed description and in the several figures of the drawing, like elements are
identified with like reference numerals.

Referring now to FIG. 1, shown therein is a block diagram of a mesh processor array 10 that includes a 
controller 20 for controlling the operation of a processor array 30 that includes one-cell processors arranged 
in a grid of M columns by N rows. The controller 20 provides a K-bit op-code INST to each of the 
processors of the array 30. The controller 20 further provides a column pattern word CSELECT to a column
select circuit 40, and provides a row pattern word RSELECT to a row select circuit 50. The output(s) OUT of
L predetermined processors can be provided to the controller 20, where L is zero or greater. Such outputs
are advantageously utilized with data dependent algorithms to control the contents of the op-code INST.

The column select circuit 40 provides M one-bit column select outputs Ci, each of which is coupled to
all of the processors of the i-th column. The row select circuit 50 provides N one-bit row select outputs Rj,
each of which is coupled to all of the processors of the j-th row. By way of illustrative example, the column
pattern word CSELECT identifies which of the column select outputs Ci are active, while the row pattern
word RSELECT identifies which of the row select outputs Rj are active. It should be appreciated that the
column select circuit 40 and the row select circuit 50 can be configured to include internal memory for
storing the current states of the column and row patterns to provide other processor addressing procedures
which can be based on the stored pattern information.

As more specifically shown in FIG. 2, the processor array 30 comprises MxN one-cell processors Pij,
wherein each processor Pij provides one data output and can receive up to four (4) data inputs at the
inputs labelled N, S, E, W, which refer to the compass references north, south, east, west that provide
convenient references as to the origination of the inputs. The input at N is from above the processor, the
input at S is from below, the input at E is from the right and the input at W is from the left.

Each processor is configured to perform logical functions involving the present output of the processor 
and/or any or all of the inputs to the processor. The operands and the logical function would be defined by 
the op-code INST. 

More particularly as to the inputs to the respective processors, each processor other than those on the
perimeter of the array receives as its four (4) inputs the outputs from its four (4) orthogonally adjacent
processors. Each processor on the perimeter of the array but not at the corners receives three inputs from
the respective outputs of the three (3) orthogonally adjacent processors, and further can receive an external
input. The processors at the corners of the array receive two (2) inputs from the respective outputs of the
two orthogonally adjacent processors, and further can receive two external inputs.

The external inputs can be provided to the processors on the perimeter of the array along the north,
south, east and west edges. The inputs along such edges are identified as Ni, Si, Ej, Wj, wherein i = 1, M
and j = 1, N. As defined above, there are M columns and N rows of processors. The external inputs are
conveniently made available by input registers NR, SR, ER, WR, respectively associated with the N, S, E,
W edges of the array and schematically depicted in FIG. 1.

By identifying external inputs to the array with the letter S and subscripts consistent with the
designation of the outputs Sij of the processors Pij (i.e., treating the external inputs as if they were outputs
of an additional column or row of processors), the inputs to the array can be defined as follows:
North: Ni = Sij, where i = 1, M and j = N + 1
South: Si = Sij, where i = 1, M and j = 0
East: Ej = Sij, where i = M + 1 and j = 1, N
West: Wj = Sij, where i = 0 and j = 1, N
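
By way of illustration only, the convention of treating external inputs as outputs of an additional row or column can be sketched in Python. The dictionary `state`, the function name `inputs_of`, and the example dimensions are assumptions made for this sketch and do not appear in the disclosure.

```python
def inputs_of(state, i, j):
    """Return the N, S, E, W inputs of processor P(i, j), with i, j 1-based.

    `state` maps (i, j) to a bit. Border indices hold the external inputs:
    j = N + 1 (north), j = 0 (south), i = M + 1 (east), i = 0 (west).
    """
    return {
        "N": state[(i, j + 1)],
        "S": state[(i, j - 1)],
        "E": state[(i + 1, j)],
        "W": state[(i - 1, j)],
    }


M, N = 3, 2  # hypothetical array of 3 columns by 2 rows
state = {(i, j): 0 for i in range(0, M + 2) for j in range(0, N + 2)}
state[(1, N + 1)] = 1  # external north input N1
state[(M + 1, 2)] = 1  # external east input E2
state[(2, 2)] = 1      # output of processor P(2, 2)

# Corner processor P(M, N): two neighbours plus two external inputs.
corner = inputs_of(state, M, N)
```

With this layout an interior processor and a perimeter processor are read in exactly the same way, which is the point of the subscript convention.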

The output Sij of each processor Pij can be coupled to up to four locations, namely as inputs to any
orthogonally adjacent processor or as an external output. Thus, the output of each processor other than
those on the perimeter of the processor array is provided as an input to each of the four (4) orthogonally
adjacent processors. The output of each processor on the perimeter but not at the corners is provided as an
input to each of the three (3) orthogonally adjacent processors and is available as an external output. The
output of each processor at the corners of the array is provided as an input to each of the two (2)
orthogonally adjacent processors and is available as two external outputs.

In terms of the compass references being utilized, the external outputs are provided by the processors
along the north, south, east and west edges of the array, and are respectively identified as NOUTi, SOUTi,
EOUTj, and WOUTj, wherein i = 1, M and j = 1, N. As defined above, there are M columns and N rows of
processors. The external outputs are conveniently provided to output registers NOUTR, SOUTR, EOUTR,
WOUTR, respectively associated with the N, S, E, W edges of the processor array.
It should be noted that for ease of reference, the outputs at the corners of the processor array are the
same. Thus, for example, NOUTM is identical to EOUTN since both are provided by the processor PMN. The
processor array outputs could be organized differently, but this organization maintains consistency with the
column and row organization.

Since the outputs of the processor array are outputs of processors at the edges of the processor array,
the outputs of the array can be denoted as follows:
North: NOUTi = Sij, where i = 1, M and j = N
South: SOUTi = Sij, where i = 1, M and j = 1
East: EOUTj = Sij, where i = M and j = 1, N
West: WOUTj = Sij, where i = 1 and j = 1, N

It is noted that although inputs to the processor array can be provided at all four edges and outputs
from the processor array are available at all four edges, not all available inputs and outputs need be utilized.
For example, a single input register and a single output register might be utilized, such as the input register
NR for inputs to the processors along the north edge and the output register SOUTR for outputs along the
south edge. The discussion of inputs and outputs along each edge is to illustrate the general architecture of
the mesh processor array.

As further shown in FIG. 2, each processor Pij includes a column select input C for receiving the
column select signal Ci and a row select input R for receiving the row select signal Rj. As discussed above,
the column select signals Ci and the row select signals Rj are respectively provided by the column select
circuit 40 and the row select circuit 50. Each processor also includes a K-bit wide input I for receiving the
K-bit op-code INST from the controller 20.

In operation, the processors of the array operate synchronously in parallel, with the clocking being
provided by the column and row select signals which also determine which processors are active in a given
clock cycle. Specifically, a processor Pij is active or selected if the column and row select signals Ci and Rj are
both active. If a processor Pij is active, the state of its one-bit output Sij could change, depending on the
op-code word INST; otherwise, the state of its output does not change.
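
As a minimal sketch of the selection rule, the set of active processors can be computed from the column and row patterns. Representing CSELECT and RSELECT as lists of bits (index 0 for column 1 or row 1) and the function name `active_cells` are assumptions made for illustration.

```python
def active_cells(cselect, rselect):
    """Return the set of (i, j) for which both Ci and Rj are active.

    cselect and rselect are lists of 0/1 bits; position 0 corresponds to
    column 1 or row 1 respectively (an assumed encoding).
    """
    return {
        (i + 1, j + 1)
        for i, c in enumerate(cselect) if c
        for j, r in enumerate(rselect) if r
    }


# Columns 1 and 2 of row 2 in a 3 by 3 array.
selected = active_cells([1, 1, 0], [0, 1, 0])
```

Because a processor is clocked only when both of its select lines are active, the selectable patterns are always intersections of a set of columns with a set of rows.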

As indicated previously, each processor is configured to perform a logical function involving the present 
output of the processor and/or any or all of the inputs to the processor. An illustrative example which will 
now be discussed is a processor that can perform a 2-operand logical operation involving the present state
of the processor and a selected input.

In the illustrative example of a 2-operand processor, the op-code word INST defines (a) which of the
inputs to the processor will be used as the second operand in a logical operation having the present state of
the processor output as the first operand, and (b) the logical operation to be performed. It should be
appreciated that the logical operation is performed on the present states of the inputs and the output of a
given processor Pij. Since each processor receives four (4) one-bit data inputs, a 2-bit direction field in the
op-code word INST is utilized to define which of the data inputs is the second operand. The remaining
portion of the op-code word INST comprises an operation field which defines the logical operation to be
performed. For example, a 4-bit operation field (i.e., K = 6) can define 16 logical operations. By way of
specific example, the first two bits I1, I2 of the op-code comprise the direction field, while the remaining
four bits I3, I4, I5, I6 comprise the operation field.

For the illustrative example of a 2-bit direction field and a 4-bit operation field, the following Table I
identifies the input selected as the second operand for a selected processor Pij pursuant to the values of
the direction field wherein I2 is the LSB and I1 is the MSB. Table I specifically identifies the selected input
by processor input (N, S, E, W) and also by location in the array from where the input originates relative to
Pij. As discussed above, the input selected can be an external input.

TABLE I

Direction Field    Input Selected    Source of Input
00                 E                 Si+1,j
01                 N                 Si,j+1
10                 W                 Si-1,j
11                 S                 Si,j-1


The following Table II identifies illustrative logical operations represented by the different values of the
operation field of the op-code, where the input to the processor selected as the second operand is identified
as B, I6 is the LSB, and I3 is the MSB.

TABLE II

Operation Field    Logical Operation    Description
0000               FALSE                CLEAR
0001               Sij AND B            AND
0010               Sij AND (NOT B)      AND NOT
0011               Sij                  NOP
0100               (NOT Sij) AND B      NOT AND
0101               B                    COPY (MOVE)
0110               Sij XOR B            XOR
0111               Sij OR B             OR
1000               Sij NOR B            NOR
1001               Sij = B              EQV
1010               NOT B                COPY INVERSE
1011               Sij OR (NOT B)       OR NOT
1100               NOT Sij              INVERT
1101               (NOT Sij) OR B       NOT OR
1110               Sij NAND B           NAND
1111               TRUE                 SET

(XOR denotes the exclusive OR function)



Based on the foregoing, the new output S'ij of each active or selected processor Pij (i.e., Ci and Rj are
both active) can be defined as follows:
S'ij = F(Sij, B)

where F is the logical function defined by the op-code operation field in accordance with Table II; Sij is the
present output of the processor Pij and is the first operand; and B is the second operand and selected from
the inputs to the processor pursuant to the op-code direction field in accordance with Table I.
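
By way of a non-limiting sketch, the decoding of an op-code per Tables I and II can be expressed in Python. Packing the six bits I1 through I6 into one integer with I1 as the most significant bit, the function name `next_output`, and the dictionary of inputs are assumptions made for this sketch. The operation field is read as a four-entry truth table indexed by the two operands, with I3 giving the result for (Sij, B) = (0, 0) and I6 the result for (1, 1).

```python
# Table I: direction field value -> processor input used as operand B.
DIRECTION = {0b00: "E", 0b01: "N", 0b10: "W", 0b11: "S"}


def next_output(s, inputs, inst):
    """Return the new output F(s, B) of a selected processor.

    s      -- present one-bit output of the processor
    inputs -- dict with the one-bit values at "N", "S", "E", "W"
    inst   -- 6-bit op-code, assumed packed as I1 I2 I3 I4 I5 I6 (I1 = MSB)
    """
    direction = (inst >> 4) & 0b11   # bits I1, I2
    operation = inst & 0b1111        # bits I3..I6
    b = inputs[DIRECTION[direction]]
    # Truth-table lookup: (0,0) -> I3, (0,1) -> I4, (1,0) -> I5, (1,1) -> I6.
    return (operation >> (3 - (2 * s + b))) & 1


XOR_WITH_NORTH = 0b01_0110   # direction 01 (N), operation 0110 (XOR)
COPY_FROM_SOUTH = 0b11_0101  # direction 11 (S), operation 0101 (COPY)

new_state = next_output(1, {"N": 1, "S": 0, "E": 0, "W": 0}, XOR_WITH_NORTH)  # 1 XOR 1
```

A processor that is not selected simply keeps its present output, so `next_output` would be applied only to processors whose column and row select signals are both active.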

Referring now to FIG. 3, shown therein is a generalized schematic of a processor Pij in accordance with
the foregoing illustrative example of a 6-bit op-code having a 2-bit direction (selection) field and a 4-bit
operation field. The processor Pij includes a clocked one-bit memory cell 111 which can be implemented
with a D-type flip-flop, for example. The clock input for the one-bit memory cell is provided by an AND gate
113 which is responsive to the column and row select signals Ci, Rj. A logic circuit 115 is responsive to the
output of the memory cell 111, the op-code word INST, and the four (4) inputs to the processor. The output
of the logic circuit 115 is the result of the two-operand logical operation performed with the two operands
comprising (a) the output of the memory cell 111 and (b) one of the inputs to the processor.

Referring now to FIG. 4, shown therein is a schematic of the processor Pij of FIG. 3 showing illustrative
example implementations of the logic circuit 115 and the one-bit memory cell 111. The logic circuit 115
specifically includes a 4-to-1 multiplexer 211 which receives the 2 bits I1, I2 of the direction field of the
op-code word INST as its select inputs. The four data inputs to the multiplexer 211 are provided by the N, S,
E, W inputs to the processor. The output of the multiplexer 211 is one of the N, S, E, W inputs and is the
second operand B.

The logic circuit 115 further includes another 4-to-1 multiplexer 213 which receives the output Sij of the
memory cell 111 and the output B of the multiplexer 211 as its select inputs. The data inputs to the
multiplexer 213 are the 4 bits I3, I4, I5, I6 of the operation field of the op-code word INST. The output of
the multiplexer 213 is provided to the D-input of a clocked D-type flip-flop 213 which comprises the one-bit
memory cell 111.

Referring now to FIG. 5, shown therein is a multiplexer 100 which can be utilized as the 4-to-1
multiplexers 211 and 213 in the processor of FIG. 4. The multiplexer 100 includes first and second inverters
311, 313 responsive to the select signals C1, C2 for providing complements C1', C2'. The select signal C1
is provided as inputs to three-input AND gates 315, 317, while the complementary select signal C1' is
provided as inputs to three-input AND gates 319, 321. The select signal C2 is provided as inputs to the
AND gates 315, 319, and the complementary select signal C2' is provided as inputs to the AND gates 317,
321. The other inputs to the AND gates 315, 317, 319, 321 are provided respectively by data inputs D1, D2,
D3, D4.

For use as the multiplexer 211, the direction field bits I1, I2 are respectively provided as the select
inputs C1, C2; and the processor inputs S, N, W, E are respectively provided as the data inputs D1, D2, D3,
D4. These specific inputs to the multiplexer are indicated parenthetically on FIG. 5, and provide the
operations set forth in Table I above. The output of the multiplexer 211 is the second operand B.

For use as the multiplexer 213, the operands Sij and B are respectively provided as the select inputs
C1, C2; and the operation field bits I6, I5, I4, I3 of the op-code are respectively provided as the data inputs
D1, D2, D3, D4. These specific inputs to the multiplexer are indicated parenthetically on FIG. 5, and provide
the operations set forth in Table II above. Essentially, the operation field bit pattern for each different
operation includes the truth table for that operation. The output of the multiplexer 213 is the new state of the
processor which will be stored in the processor one-bit memory cell if such cell is selected.
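
The use of the multiplexer 213 as a truth-table lookup can be illustrated with a gate-level sketch in Python. The AND gate wiring follows the description of FIG. 5; the final OR that combines the four AND gate outputs is an assumption, since the description does not state how they are combined, and the function names `mux4` and `logic_213` are hypothetical.

```python
def mux4(c1, c2, d1, d2, d3, d4):
    """4-to-1 multiplexer modelled on FIG. 5 (all signals are 0 or 1)."""
    n1, n2 = 1 - c1, 1 - c2      # inverters 311, 313 give C1', C2'
    g315 = c1 & c2 & d1          # AND gate 315: C1, C2, D1
    g317 = c1 & n2 & d2          # AND gate 317: C1, C2', D2
    g319 = n1 & c2 & d3          # AND gate 319: C1', C2, D3
    g321 = n1 & n2 & d4          # AND gate 321: C1', C2', D4
    return g315 | g317 | g319 | g321   # assumed OR of the gate outputs


def logic_213(s, b, i3, i4, i5, i6):
    """Multiplexer 213: selects S, B; data inputs D1..D4 = I6, I5, I4, I3."""
    return mux4(s, b, i6, i5, i4, i3)


# Operation field 0110 (XOR) evaluated for (S, B) = (0,0), (0,1), (1,0), (1,1).
xor_table = [logic_213(s, b, 0, 1, 1, 0) for s in (0, 1) for b in (0, 1)]
# xor_table == [0, 1, 1, 0]
```

Read left to right, the operation field I3 I4 I5 I6 is therefore the output column of the truth table for the operation, which is why sixteen field values cover all sixteen two-operand logical functions.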

It should be appreciated that the specific clocking of the processors Pij via the column and row
selection circuits will depend on the specific implementations of the processors. Thus, for the example of
clocked D-type flip-flop memory cells, the column and row select signals would be controlled to transition to
the active state only after the op-code and external inputs are valid (i.e., the op-code word is provided early
in the clock cycle). Thus, in each clock cycle the selected column and row select signals will transition to
the active state and then to the inactive state. In this manner, the new state of a processor does not affect
the logical function involving the present output of the processor.

Although not explicitly shown, it should also be appreciated that initialization of the outputs Sij of the
processors Pij will depend on the particular implementation. For the clocked D-type flip-flop memory cells,
the outputs can be preset or cleared by separate control lines (not shown) or by defining an op-code which
forces the outputs of selected processors to be a logical one or zero (e.g., high voltage or low voltage).

With the understanding of the foregoing clocking and initialization considerations, the general operation
of the mesh processor is as follows. The processors are initialized (e.g., preset, cleared, reset, or set) and
external data is made available via an input data register. Also, an op-code word, a column select word
CSELECT, and a row select word RSELECT are made available by the controller 20. The selected processors are then
clocked by the column and row select signals Ci, Rj. The procedure of providing external data, an op-code
word, a column select word, and a row select word is then repeated, and followed by appropriate clocking
via the column and row select signals Ci, Rj. The output of the processor array can be provided to an output
register, for example.

As discussed previously, only those processors selected by the column and row select signals are 
clocked and can change their output states, depending on the op-code and the states of the operands. The 
output states of the processors not selected are not changed.

Referring now to FIGS. 6A-6M, a 3 by 3 processor array having processors that provide the functions
set forth in Tables I and II, above, will now be discussed relative to the addition of two 3-bit unsigned binary
integers X, Y stored in the top and middle rows (i.e., rows 3 and 2) of the array, with the least significant
bits to the right (i.e., column 3 has the least significant bit for each row). The binary integers X, Y can be
loaded into the rows 3 and 2 by loading X into an input register at the top of the array, copying the contents
of the register into the row 3 processors, loading Y into the input register, copying the contents of the row 3
processors into the row 2, and copying the input register contents into the row 3.

Starting with the initial condition of the integers X, Y in rows 3 and 2 as depicted in FIG. 6A for the
integers 2 and 3, the following Table III sets forth the necessary steps for placing the sum ((X + Y) mod 8) in
row 1 of the array.



TABLE III

Step    Col.    Row    Logical Operation    Input Direction    FIG.
1       All     1      COPY                 N                  6B
2       All     3      XOR                  S                  6C
3       All     2      XOR                  N                  6D
4       All     1      AND                  N                  6E
5       All     2      OR                   N                  6F
6       3       2      COPY                 S                  6G
7       2       2      AND                  E                  6H
8       2       2      OR                   S                  6I
9       1,2     2      COPY                 E                  6J
10      3       2      RESET                None               6K
11      All     2      XOR                  N                  6L
12      All     1      COPY                 N                  6M

(XOR denotes the exclusive OR function)
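
The twelve steps of Table III can be traced with a short Python simulation, offered as a sketch rather than as the disclosed hardware. The function name `add_mod8`, the dictionary representation of the array, and the treatment of RESET as the CLEAR operation (0000) of Table II are assumptions. Each step computes the new outputs of all selected processors from the present outputs before any of them is updated, mirroring the synchronous clocking.

```python
def add_mod8(x, y):
    """Run the steps of Table III on a 3 by 3 array and return row 1 as an integer."""
    M = N = 3
    # cell[(i, j)]: column i (3 = rightmost, LSB), row j (1 = bottom, 3 = top).
    # Border indices stand for unused external inputs, held at 0.
    cell = {(i, j): 0 for i in range(0, M + 2) for j in range(0, N + 2)}
    for i in range(1, M + 1):
        bit = M - i                       # column 3 holds bit 0
        cell[(i, 3)] = (x >> bit) & 1     # X in row 3
        cell[(i, 2)] = (y >> bit) & 1     # Y in row 2

    # Operation fields from Table II; RESET is assumed to be CLEAR (0000).
    ops = {"COPY": 0b0101, "XOR": 0b0110, "AND": 0b0001, "OR": 0b0111, "RESET": 0b0000}
    offset = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0), None: (0, 0)}
    every = (1, 2, 3)
    steps = [  # (selected columns, selected row, operation, input direction)
        (every, 1, "COPY", "N"), (every, 3, "XOR", "S"), (every, 2, "XOR", "N"),
        (every, 1, "AND", "N"), (every, 2, "OR", "N"), ((3,), 2, "COPY", "S"),
        ((2,), 2, "AND", "E"), ((2,), 2, "OR", "S"), ((1, 2), 2, "COPY", "E"),
        ((3,), 2, "RESET", None), (every, 2, "XOR", "N"), (every, 1, "COPY", "N"),
    ]
    for cols, row, op, direction in steps:
        di, dj = offset[direction]
        new = {}
        for i in cols:
            s = cell[(i, row)]                 # first operand: present output
            b = cell[(i + di, row + dj)]       # second operand: selected input
            new[(i, row)] = (ops[op] >> (3 - (2 * s + b))) & 1
        cell.update(new)                       # selected cells change together
    return sum(cell[(i, 1)] << (M - i) for i in range(1, M + 1))


result = add_mod8(2, 3)  # the example of FIG. 6A; result == 5
```

In this trace, steps 1 through 5 leave X XOR Y in row 3, X OR Y in row 2 and X AND Y in row 1; steps 6 through 10 turn row 2 into the carry into each bit position; and steps 11 and 12 form the sum and copy it to row 1.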



The foregoing has been a disclosure of a mesh processor array that utilizes an efficient processor cell,
can be clocked at higher rates, provides computational flexibility and provides for efficient memory
utilization. The array architecture readily and efficiently implements defined synchronous logic, for example,
pursuant to appropriate sequences of instructions based on the particular logical functions of such defined
synchronous logic. And due to the flexibility of the disclosed processor array, the resulting implementation
of the particular logical functions can be adapted to provide for more efficient and faster processing, for
example by logic minimization techniques. As a particular example of the flexibility of the disclosed
processor array, persons skilled in the art will appreciate that existing algorithms designed for known
parallel processor arrays having more memory per processor cell can be implemented with the disclosed
processor array, for example, by grouping multiple bit cells of the invention for each of the multiple bit
memory cells.

Although the foregoing has been a description and illustration of specific embodiments of the invention, 
various modifications and changes thereto can be made by persons skilled in the art without departing from 
the scope and spirit of the invention as defined by the following claims.

Claims 

1. A mesh processor array, characterized by:

- a plurality of one-bit logic processors (Pij) arranged in a matrix (30) and providing respective one-bit logic
outputs;

- control means (20) for providing a control word to each of said processors (Pij) in parallel and for providing
a selection signal indicative of selected ones of said processors (Pij); and

- selection means (40, 50) responsive to said selection signal for enabling selected ones of said processors
(Pij) to respond to said control word.

2. The mesh processor array of Claim 1, characterized in that said plurality of processors (Pij) are arranged
in columns and rows.

3. The mesh processor array of Claim 2, characterized in that said selection means (40, 50) comprises a
column selection circuit (40) and a row selection circuit (50).

4. The mesh processor array of any of Claims 1 through 3, characterized in that each of said one-bit
processors (Pij) comprises:

- means (111) for storing one-bit data and for providing said one-bit data as the output of the processor
(Pij); and

- logic means (115) responsive to said control word, said processor output, and one-bit logic inputs which
include the one-bit logic outputs of certain adjacent processors (Pij) for providing to said storing means
(111) a logical output that is the result of a logical function involving said processor output and said logic
signal inputs as defined by said control word.
5. The mesh processor array of Claim 4, characterized in that said logic means (115) provides a logical
output that is the result of a logical operation of (a) said processor output and (b) one of said logic inputs.

6. The mesh processor array of Claim 4 or 5, characterized in that said storing means (111) comprises a
clocked memory device which is clocked by said selection means (40, 50).

7. The mesh processor array of Claim 6, characterized in that said clocked memory device comprises a
flip-flop.

8. The mesh processor array of any of Claims 4 through 7, characterized in that said plurality of processors
(Pij) are arranged in a grid of columns and rows and wherein:

- the one-bit logic inputs for each processor (Pij) on the perimeter of the grid but not on a corner of the
array (30) include an external one-bit input logic signal;

- the one-bit logic inputs for each processor (Pij) on the corner of the array (30) include two external one-bit
input logic signals; and

- the one-bit logic inputs for each processor (Pij) not on the perimeter of the grid include only one-bit logic
outputs of the orthogonally adjacent processors (Pij).

9. The mesh processor array of any of Claims 4 through 8, characterized in that the logic inputs to the
processor array (30) are provided to the processors (Pij) on the perimeter of said matrix, and wherein the
outputs of the processor array (30) are provided by the processors on the perimeter of said matrix.

10. A one-bit processor comprising:

- means (111) for storing one-bit data and for providing said one-bit data as the output of the processor
(Pij); and

- logic means (115) responsive to a control word, said processor output, and one-bit logic inputs for
providing to said storing means (111) a logical output that is the result of a logical function involving said
processor output and said logic signal inputs as defined by said control word.

11. The one-bit processor of Claim 10, characterized in that said logic means (115) provides a logical
function of (a) said processor output and (b) one of said logic signal inputs.

12. The one-bit processor of Claim 10 or 11, characterized in that said storing means (111) comprises a
clocked memory device.

13. The one-bit processor of Claim 12, characterized in that said clocked memory device comprises a
flip-flop.